Towards Morphologically Annotated Corpus of Hospital Discharge Reports in Polish

نویسندگان

  • Malgorzata Marciniak
  • Agnieszka Mykowiecka
چکیده

The paper discuses problems in annotating a corpus containing Polish clinical data with low level linguistic information. We propose an approach to tokenization and automatic morphologic annotation of data that uses existing programs combined with a set of domain specific rules and vocabulary. Finally we present the results of manual verification of the annotation for a subset of data.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Some Remarks on Automatic Semantic Annotation of a Medical Corpus

In this paper we present arguments that elaborating a rule based information extraction system is a good starting point for obtaining a semantic annotated corpus of medical data. Our claim is supported by evaluation results of the automatic annotation of a corpus containing hospital discharge reports of diabetic patients.

متن کامل

Corpus-based Semantic Relatedness for the Construction of Polish WordNet

The construction of a wordnet, a labour-intensive enterprise, can be significantly assisted by automatic grouping of lexical material and discovery of lexical semantic relations. The objective is to ensure high quality of automatically acquired results before they are presented for lexicographers’ approval. We discuss a software tool that suggests synset members using a measure of semantic rela...

متن کامل

Towards 100M Morphologically Annotated Corpus of Tajik

The paper presents a work in progress: building morphologically annotated corpus of Tajik language of the size more than 100 million tokens. The corpus is and will be by far the largest available computer corpus of Tajik: even its current size is almost 85 million tokens. Because the available text sources are rather scarce, to achieve the goal also the texts of a lower quality have to be inclu...

متن کامل

Experiments in Cross-Language Morphological Annotation Transfer

Annotated corpora are valuable resources for NLP which are often costly to create. We introduce a method for transferring annotation from a morphologically annotated corpus of a source language to a target language. Our approach assumes only that an unannotated text corpus exists for the target language and a simple textbook which describes the basic morphological properties of that language is...

متن کامل

Portable Language Technology: a Resource-light Approach to Morpho-syntactic Tagging

Morpho-syntactic tagging is the process of assigning part of speech (POS), case, number, gender, and other morphological information to each word in a corpus. Morpho-syntactic tagging is an important step in natural language processing. Corpora that have been morphologically tagged are very useful both for linguistic research, e.g. finding instances or frequencies of particular constructions in...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011